TTS-Guided Training for Accent Conversion Without Parallel Data
Abstract
Accent Conversion (AC) seeks to change the accent of speech from one (source) to another (target) while preserving the speech content and speaker identity. However, many existing AC approaches rely on source-target parallel speech data during training or on reference speech at run-time. We propose a novel accent conversion framework without the need for either parallel data or reference speech. Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data. This TTS model and its hidden representations are expected to be associated only with the target accent. Then, a speech encoder is trained to convert the accent of the speech under the supervision of the pretrained TTS model. In doing so, the source-accented speech and its corresponding transcription are forwarded to the speech encoder and the pretrained TTS, respectively. The output of the speech encoder is optimized to be the same as the text embedding in the TTS system. At run-time, the speech encoder is combined with the pretrained TTS decoder to convert the source-accented speech toward the target. In the experiments, we converted English with two source accents (Chinese/Indian) to the target accent (American/British/Canadian). Both objective metrics and subjective listening tests successfully validate that the proposed approach generates speech samples that are close to the target accent with high speech quality.
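The training idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a linear speech encoder, one speech frame per text token (so no alignment step), a mean-squared-error objective, and random toy data. All names (`train_speech_encoder`, `text_embedding`, `convert`, `tts_decoder`) are hypothetical, and the identity decoder is only a placeholder for a pretrained TTS decoder.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_speech_encoder(frames, targets, lr=0.2, epochs=300, seed=0):
    """Fit a linear encoder W so that W @ frame matches the frozen text embedding.

    Toy stand-in (assumption): one speech frame per token, MSE loss, plain SGD.
    """
    rng = random.Random(seed)
    dim_in, dim_out = len(frames[0]), len(targets[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(frames, targets):
            err = [y - ti for y, ti in zip(matvec(W, x), t)]
            total += sum(e * e for e in err) / dim_out
            # Gradient of the MSE w.r.t. W is (2 / dim_out) * err * x^T.
            for i in range(dim_out):
                step = lr * 2.0 * err[i] / dim_out
                for j in range(dim_in):
                    W[i][j] -= step * x[j]
        losses.append(total / len(frames))
    return W, losses

rng = random.Random(1)
# Frozen "TTS text embedding" table (hypothetical): token id -> 4-dim vector.
text_embedding = {tok: [rng.uniform(-1, 1) for _ in range(4)] for tok in range(6)}
# Toy source-accented "speech": one 8-dim frame per token, paired by transcription.
tokens = list(range(6))
frames = [[rng.uniform(-1, 1) for _ in range(8)] for _ in tokens]
targets = [text_embedding[tok] for tok in tokens]  # supervision from the frozen TTS

W, losses = train_speech_encoder(frames, targets)

def convert(frame, tts_decoder):
    """Run-time path: speech encoder output is fed to the frozen TTS decoder."""
    return tts_decoder(matvec(W, frame))

# Identity decoder used only as a placeholder for the pretrained TTS decoder.
converted = convert(frames[0], tts_decoder=lambda h: h)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Only the encoder weights `W` are updated; the text embedding table stays fixed, mirroring the abstract's point that the pretrained TTS supplies the supervision. A real system would presumably use neural encoders and handle the length mismatch between speech frames and text tokens, which this sketch sidesteps.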
Similar resources
Statistical machine translation without long parallel sentences for training data
In this study, we focus on the reliability of the phrase table. We have been using phrase tables built with Och’s method [2], and this method sometimes generates completely wrong phrase tables. We found that such phrase tables are caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. Also, we utilized general tools for statistical machine translat...
Foreign accent conversion in computer assisted pronunciation training
Learners of a second language practice their pronunciation by listening to and imitating utterances from native speakers. Recent research has shown that choosing a well-matched native speaker to imitate can have a positive impact on pronunciation training. Here we propose a voice-transformation technique that can be used to generate the (arguably) ideal voice to imitate: the own voice of the le...
MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
This study presents an approach to GMM-based speech conversion using maximum a posteriori probability (MAP) adaptation. First, a conversion function is trained using a parallel corpus containing the same utterances spoken by both the source and the reference speakers. Then a non-parallel corpus from a new target speaker is used for the adaptation of the conversion function which models the voic...
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences
We extend our recently proposed approach to cross-lingual TTS training to voice conversion, without using parallel training sentences. It employs Speaker Independent, Deep Neural Net (SIDNN) ASR to equalize the difference between source and target speakers and Kullback-Leibler Divergence (KLD) to convert spectral parameters probabilistically in the phonetic space via ASR senone posterior probab...
MLLR-based accent model adaptation without accented data
When the user has an accent different from the one the automatic speech recognition system was trained with, the performance of the system degrades. This is attributed to both acoustic and phonological differences between accents. The phonological differences between two accents are due to different phoneme inventories in two languages. Even for the same phoneme, foreigners and native speakers p...
Journal
Journal title: IEEE Signal Processing Letters
Year: 2023
ISSN: 1558-2361, 1070-9908
DOI: https://doi.org/10.1109/lsp.2023.3270079